Sparse Suffix Tree Construction with Small Space

نویسندگان

  • Philip Bille
  • Inge Li Gørtz
  • Tsvi Kopelowitz
  • Benjamin Sach
  • Hjalte Wedel Vildhøj
چکیده

We consider the problem of constructing a sparse suffix tree (or suffix array) for b suffixes of a given text T of size n, using only O(b) words of space during construction time. Breaking the naive bound of Ω(nb) time for this problem has occupied many algorithmic researchers since a different structure, the (evenly spaced) sparse suffix tree, was introduced by Kärkkäinen and Ukkonen in 1996. While in the evenly spaced sparse suffix tree the suffixes considered must be evenly spaced in T , here there is no constraint on the locations of the suffixes. We show that the sparse suffix tree can be constructed in O(n log b) time. To achieve this we develop a technique, which may be of independent interest, that allows to efficiently answer b longest common prefix queries on suffixes of T , using only O(b) space. We expect that this technique will prove useful in many other applications in which space usage is a concern. Furthermore, additional tradeoffs between the space usage and the construction time are given.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Sparse Suffix Tree Construction in Optimal Time and Space

Suffix tree (and the closely related suffix array) are fundamental structures capturing all substrings of a given text essentially by storing all its suffixes in the lexicographical order. In some applications, such as sparse text indexing, we work with a subset of b interesting suffixes, which are stored in the so-called sparse suffix tree. Because the size of this structure is Θ(b), it is nat...

متن کامل

Optimal Substring-Equality Queries with Applications to Sparse Text Indexing

We consider the problem of encoding a string of length n from an alphabet [0, σ − 1] so that access and substring-equality queries (that is, determining the equality of any two substrings) can be answered efficiently. A clear lower bound on the size of any prefix-free encoding of this kind is n log σ + Θ(log(nσ)) bits. We describe a new encoding matching this lower bound when σ ≤ nO(1) while su...

متن کامل

On-Line Linear-Time Construction of Word Suffix Trees

Suffix trees are the key data structure for text string matching, and are used in wide application areas such as bioinformatics and data compression. Sparse suffix trees are kind of suffix trees that represent only a subset of suffixes of the input string. In this paper we study word suffix trees, which are one variation of sparse suffix trees. Let D be a dictionary of words and w be a string i...

متن کامل

Sparse compact directed acyclic word graphs

The suffix tree of string w represents all suffixes of w, and thus it supports full indexing of w for exact pattern matching. On the other hand, a sparse suffix tree of w represents only a subset of the suffixes of w, and therefore it supports sparse indexing of w. There has been a wide range of applications of sparse suffix trees, e.g., natural language processing and biological sequence analy...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1207.1135  شماره 

صفحات  -

تاریخ انتشار 2012